Given a multi-valued mapping F, we address the problem of finding
another multi-valued mapping H that agrees locally with F in some sense.
We show that, contrary to the scalar case, introducing a derivative of F is
hardly convenient. For the case when F is convex-compact-valued, we give
some possible approximations, and at the same time we show their limitations.
The present paper is limited to informal demonstration of concepts and
mechanisms. Formal statements and their proofs will be published elsewhere.

SIGNIFICANCE AND EXPLANATION

This paper is concerned with extension of the concept of derivative from
functions to multi-valued mappings. Proper definitions of such extensions are
useful to solve inclusions, and more specifically to minimize convex
functions. Simple examples are given to show the difficulties, and some
proposals are made to overcome them.

1. INTRODUCTION

Consider first the problem of solving a nonlinear system:

$$ f(x) = 0 \tag{1} $$

where f is a vector-valued function. If we find a first order approximation
of f near x, i.e. a vector-valued bi-function h such that

$$ h(x;d) = f(x+d) + o(d) \tag{2} $$

(where $o(d)/\|d\| \to 0$ when $d \to 0$) then we can apply the Newton principle:
given a current iterate x, solve for d

$$ h(x;d) = 0 \tag{3} $$

(supposedly simpler than (1)) and move to $x + d$.

It is well known that if f is differentiable and if, in addition to
satisfying (2), h is required to be affine in d, then it is unambiguously
defined by

$$ h(x;d) := f(x) + f'(x)\,d \tag{4} $$

Merging (2) and (4) and subtracting $f(x)$ also gives an unambiguous
definition of $f'$ (the Jacobian operator of f) by:

$$ f'(x)\,d := f(x+d) - f(x) + o(d). $$
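
The Newton principle (3), with the affine model (4), can be sketched for a
scalar f as follows. This is only an illustration: the function name
`newton`, its stopping rule and the test function $x^2 - 2$ are assumptions
made here, not part of the paper.

```python
def newton(f, fprime, x, tol=1e-12, max_iter=50):
    """Repeat: solve f(x) + f'(x) d = 0 for d, then move to x + d."""
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) <= tol:
            break
        d = -fx / fprime(x)  # root in d of the affine model (4)
        x = x + d
    return x


# Hypothetical test function: f(x) = x**2 - 2, with positive root sqrt(2).
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x=1.0)
print(root)
```

In several dimensions the division would be replaced by the solution of a
linear system with the Jacobian $f'(x)$.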


Suppose now that we have to solve

$$ 0 \in F(x) \tag{5} $$

where F is a multi-valued mapping, i.e. $F(x) \subset R^n$. A possible application
of (5) is in nonsmooth optimization, when F is the (approximate)
subdifferential of an objective function to be minimized. To apply the same
principle as in the single valued case, $F(x+d)$ must be approximated by some
set $H(x;d) \subset R^n$. Continuing the parallel and requiring H to be affine in d
(whatever it means), we must express it as a sum of two sets:
$H(x;d) = F(x) + G$.


In summary, we want to find a set G such that, for all $\varepsilon > 0$ and
$\|d\|$ small enough:

$$ F(x+d) \subset F(x) + G + \varepsilon \|d\| U \tag{6.a} $$

and

$$ F(x) + G \subset F(x+d) + \varepsilon \|d\| U \tag{6.b} $$

where U is the unit ball of $R^n$. Unfortunately, such a formulation is already
worthless. First, it does not help to define the "linearization" G: just
because the set of subsets is not a group, $F(x)$ cannot be subtracted in (6).
Furthermore, (6) is extremely restrictive: for n = 1, consider the innocent
mapping $F(x) := [0,3x]$ (defined for $x \ge 0$). Take x = 1, $\varepsilon = 1$ and d < 0.
Then $F(x+d) + \varepsilon|d|U = [d, 3+2d]$, an interval of length $3 + d < 3$, so
it is impossible to find a set G satisfying (6.b), i.e. $[0,3] + G \subset [d, 3+2d]$.
For example, $G = \{d\}$ is already too "thick".
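
The obstruction in this example can be checked numerically. The sketch below
(helper names `rhs_interval` and `length` are introduced here for
illustration only) computes the right-hand side of (6.b) for $F(x) = [0,3x]$,
x = 1, $\varepsilon = 1$, and compares its length with 3, the least possible
length of $[0,3] + G$ for a nonempty G.

```python
def rhs_interval(d, eps=1.0, x=1.0):
    """Interval F(x+d) + eps*|d|*U for F(x) = [0, 3x] in dimension n = 1."""
    lo, hi = 0.0, 3.0 * (x + d)
    r = eps * abs(d)
    return lo - r, hi + r


def length(interval):
    lo, hi = interval
    return hi - lo


for d in (-0.5, -0.1, -0.01):
    rhs = rhs_interval(d)
    # [0,3] + G has length at least 3 for any nonempty G,
    # so (6.b) fails whenever the right-hand side is shorter than 3.
    print(d, rhs, length(rhs) < 3.0)
```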

A conclusion of this section is that a first order approximation to a
multivalued mapping cannot be readily constructed by a standard
linearization; the definition of such an approximation is at present ambiguous.

For a deep insight into differentiability of sets, we refer to [6] and its
large bibliography. Here, for want of a complete theory, we will give in
the next sections two possible proposals. Neither of them is fully
satisfactory, but they are rather complementary, in the sense that each one
has a chance to be convenient when the other is not. We will restrict
ourselves to the convex compact case. Furthermore, as is usual in
nondifferentiable optimization, we will consider only directional
derivatives. Therefore we adopt simpler notations: x and the direction d
being fixed, we call $F(t)$ the image by F of $x + td$, $t \ge 0$. We say that H
approximates F to 1st order near $t = 0^+$ if for every $\varepsilon > 0$, there is
$\delta > 0$ such that $t \in [0,\delta]$ implies

$$ F(t) \subset H(t) + \varepsilon t U \quad\text{and}\quad H(t) \subset F(t) + \varepsilon t U \tag{7} $$

Note that, among others, F approximates itself!
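
For interval-valued mappings (n = 1), condition (7) can be tested on a grid
of values of t. The following is a minimal sketch under that restriction;
the functions `contained` and `approximates`, and the two mappings F and H,
are hypothetical examples chosen here, and a sampled check is of course not
a proof.

```python
def contained(a, b, r):
    """True if interval a lies inside interval b enlarged by radius r."""
    return b[0] - r <= a[0] and a[1] <= b[1] + r


def approximates(F, H, eps, delta, samples=1000):
    """Sampled check of both inclusions of (7) for t in (0, delta]."""
    for k in range(1, samples + 1):
        t = delta * k / samples
        r = eps * t
        if not (contained(F(t), H(t), r) and contained(H(t), F(t), r)):
            return False
    return True


F = lambda t: (0.0, 1.0 + t + t * t)  # hypothetical mapping
H = lambda t: (0.0, 1.0 + t)          # candidate first order approximation

print(approximates(F, H, eps=0.1, delta=0.05))  # t*t <= eps*t on this range
print(approximates(F, H, eps=0.1, delta=1.0))   # delta too large for this eps
```

Here the two sets differ by $t^2$, so for a given $\varepsilon$ any
$\delta < \varepsilon$ works, which is the pattern required by (7).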


2. MAPPINGS DEFINED BY A SET OF CONSTRAINTS

As a first illustration, suppose F is defined by:

$$ F(t) := \{\, z \in R^n \mid c_j(t,z) \le 0 \ \text{for } j = 1,\dots,m \,\} $$

where the "constraints" $c_j$ are convex in z. Assume the existence of $c'_j(0,z)$,
the right derivative of $c_j(\cdot,z)$ at t = 0 ($\dot{c}_j(0,z)$ would be more
suggestive). Then it is natural to consider approximating $F(t)$ by

$$ H(t) := \{\, z \mid c_j(0,z) + t\, c'_j(0,z) \le 0 \ \text{for } j = 1,\dots,m \,\}. \tag{8} $$
An algorithm based on this set would then be quite in the spirit of [7].

It is possible to prove that the H of (8) does satisfy (7), provided
some hypotheses hold, for example

(i) $[c_j(t,z) - c_j(0,z)]/t \to c'_j(0,z)$ uniformly in z, when $t \downarrow 0$,

(ii) there exists $z_0$ such that $c_j(0,z_0) < 0$ for $j = 1,\dots,m$.

A weak point of (8) is that it is highly non-canonical. For example,
perturbing the constraints to $(1 + a_j t)\, c_j(t,z)$ gives the same F but does
change H.
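
The non-canonical character of (8) can be seen on a one-dimensional example
chosen here for illustration (it is not taken from the paper): n = m = 1 and
$c(t,z) = z^2 - (1+t)^2$, so that $F(t) = [-(1+t), 1+t]$. Multiplying c by
$(1 + at)$ leaves F unchanged for small t, but the linearized constraint
becomes $(1+at)(z^2-1) - 2t \le 0$, whose solution set depends on a.

```python
import math


def radius_H(t, a=0.0):
    """Half-width of H(t) from (8) for the constraint
    (1 + a*t) * (z**2 - (1 + t)**2) <= 0.

    Linearizing in t at t = 0 gives (1 + a*t) * (z**2 - 1) - 2*t <= 0,
    i.e. z**2 <= 1 + 2*t / (1 + a*t)   (assuming 1 + a*t > 0).
    """
    return math.sqrt(1.0 + 2.0 * t / (1.0 + a * t))


t = 0.1
print(1.0 + t)             # half-width of F(t), independent of a
print(radius_H(t))         # a = 0
print(radius_H(t, a=5.0))  # a = 5: same F, different H
```

In this particular example both half-widths still agree with $1 + t$ to
first order in t; only the set H itself changes.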

3. A DIRECT SET-THEORETIC CONSTRUCTION

If we examine (6) again, we see that there would be no difficulty if
$F(x)$ were a singleton: then (6) would always be consistent because $F(x+d)$
would never be less thick than $F(x)$, and $F(x)$ could be subtracted. This
leads to differentiating F at an arbitrary but fixed $y \in F(0)$. Define

$$ F'_y(0) := \{\, z \mid \text{there exist } t_n \text{ and } y_n \in F(t_n) \text{ for } n \in N, \text{ with } t_n \downarrow 0 \text{ and } (y_n - y)/t_n \to z \,\} $$

or, in a set-theoretic notation (see [2], Chapter VI):

$$ F'_y(0) := \limsup_{t \downarrow 0}\, [F(t) - y]/t $$

This set is called the contingent derivative in [1], the (radial) upper Dini
derivative in [6] and the feasible set of first order in [3]. We refer to [1]
for an extensive study of $F'_y$, but some remarks will be useful:

a) $F'_y(0)$ depends on the behaviour of F near y only. If we take an
arbitrary $\alpha > 0$ and set $G(t) := F(t) \cap \{y + \alpha U\}$, then $G'_y(0) = F'_y(0)$.

b) If $F(t) = F(0)$ does not depend on t, $F'_y(0)$ is just the tangent cone
to $F(0)$ at y.

c) Let A be a convex set in $R^n$, and $f : [0,1] \to R^n$ a differentiable
mapping (with $f(0) = 0$ for notational simplicity). Consider $F(t) := \{f(t)\} + A$.
Given $y \in F(0) = A$, call $T_y$ the tangent cone to $F(0) = A$ at y. Then it can
be shown that $F'_y(0) = \{f'(0)\} + T_y$. This is the situation when F is the
approximate subdifferential of a convex quadratic function (see [4]).

d) Let n = 2. Given $r \in R$, consider $F(t) := P(t) \cap U$ with the halfspace
$P(t) := \{\, y = (y_1,y_2) \mid y_2 \ge r\,t\,y_1 \,\}$. It can be shown that, for $y = 0 \in F(0)$,
$F'_0(0) = \{\, z = (z_1,z_2) \mid z_2 \ge 0 \,\}$; $F'_0(0)$ is the same as it would be if r were 0
(in which case $F(t)$ would be fixed), and does not predict the rotation of
$F(t)$ around y = 0.
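
Remark d) can be probed numerically. The sketch below assumes that the
halfspace reads $P(t) = \{y \mid y_2 \ge r\,t\,y_1\}$ (the formula is damaged in
the available text, so this reading is an assumption), and tests whether a
given z belongs to the quotient set $[F(t) - y]/t$ at y = 0. The helper names
are introduced here for illustration.

```python
import math


def in_F(t, y, r):
    """Membership in F(t) = P(t) intersected with the unit ball U,
    where P(t) = {y : y2 >= r*t*y1} (assumed form of the halfspace)."""
    y1, y2 = y
    return y2 >= r * t * y1 and math.hypot(y1, y2) <= 1.0


def in_quotient(t, z, r):
    """Membership of z in [F(t) - y]/t at the point y = 0."""
    return in_F(t, (t * z[0], t * z[1]), r)


z_up, z_down = (5.0, 0.1), (5.0, -0.1)
for r in (3.0, -3.0):
    for t in (1e-2, 1e-3, 1e-4):
        print(r, t, in_quotient(t, z_up, r), in_quotient(t, z_down, r))
```

For small t the point with $z_2 > 0$ is accepted and the point with $z_2 < 0$
is rejected, whatever the sign of r: the rotation rate r leaves no trace in
the limit set.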


Because a convex set is the intersection of the cones tangent to it,
our remark b) above suggests approximating $F(t)$ by

$$ H(t) := \bigcap\, \{\, y + t\,F'_y(0) \mid y \in F(0) \,\} \tag{9} $$

Of course, this will be possible only under additional assumptions (not only
due to the multi-valuedness of F; for example $F(t) := \{t \sin \log t\}$ has
$F(0) = \{0\}$, $F'_0(0) = [-1,+1]$ and $H(t) = [-t,+t]$).
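
The single-valued example $F(t) = \{t \sin \log t\}$ can be checked directly:
the difference quotient at y = 0 is $\sin \log t$, which keeps oscillating
between -1 and +1 as t decreases to 0. In the sketch below the sequences
`t_plus` and `t_minus` are chosen here so that the quotient stays at +1 and
-1 respectively.

```python
import math


def quotient(t):
    """Difference quotient (y - 0)/t for the point y = t*sin(log t) of F(t)."""
    return math.sin(math.log(t))


for n in range(1, 5):
    t_plus = math.exp(math.pi / 2 - 2 * math.pi * n)    # log t = pi/2 mod 2*pi
    t_minus = math.exp(-math.pi / 2 - 2 * math.pi * n)  # log t = -pi/2 mod 2*pi
    print(t_plus, quotient(t_plus), quotient(t_minus))
```

Every value in [-1,+1] is reached along some sequence $t_n \downarrow 0$, while
$F(t)$ itself is a single point, so $H(t) = [-t,+t]$ is far too large.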

Before mentioning the assumptions in question, we introduce another
candidate to approximate F: for $p \in R^n$, denote by
$s_p(t) := \sup\, \{\langle p,y\rangle \mid y \in F(t)\}$ the support function of $F(t)$. It is
known that F can be described in terms of s, namely
$F(t) = \{\, y \mid \langle p,y\rangle \le s_p(t) \ \forall p \in R^n \,\}$. Then, if s has a
(directional) derivative $s'_p(0)$, the following set is natural (see [5]):

$$ G(t) := \{\, y \mid \langle p,y\rangle \le s_p(0) + t\, s'_p(0) \ \forall p \in R^n \,\}. \tag{10} $$
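
In dimension n = 1, where the sets are intervals and only p = +1 and p = -1
matter (by positive homogeneity of the support function), the set (10) can
be computed explicitly. The following sketch does so, estimating $s'_p(0)$ by
a forward difference; the step h, the function names and the mapping F are
assumptions made for this illustration.

```python
def support(interval, p):
    """Support function sup{p*y : y in [lo, hi]} of an interval."""
    lo, hi = interval
    return p * hi if p >= 0 else p * lo


def G(F, t, h=1e-6):
    """Interval version of (10); s'_p(0) is estimated by a forward difference."""
    bounds = {}
    for p in (1.0, -1.0):
        s0 = support(F(0.0), p)
        ds = (support(F(h), p) - s0) / h  # approximates s'_p(0)
        bounds[p] = s0 + t * ds
    # y <= bounds[+1] and -y <= bounds[-1]
    return (-bounds[-1.0], bounds[1.0])


F = lambda t: (t * t, 3.0 - 3.0 * t)  # hypothetical interval-valued mapping
print(G(F, 0.1))
```

For this F the lower end moves only to second order, so (10) keeps it at 0
(up to the finite-difference error) and moves the upper end linearly.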

To assess these candidates (9) and (10), the following assumptions can
be considered:

(i) $[s_p(t) - s_p(0)]/t \to s'_p(0)$ uniformly for $p \in U$, when $t \downarrow 0$;

(ii) $F(0)$ has a nonempty interior.

They allow us to prove:

If (i) holds, then $H(t) = G(t)$; if (ii) also holds, then (7) holds.

We remark that (i) alone suffices to prove the second half of (7), which
is the important one for (5) (solving $0 \in H(t)$ gives some among the possible
Newton iterates); however $H(t)$ may be void if (ii) does not hold. It is also
interesting to remark that, if $s'_p(0)$ is assumed to be convex in p (in which
case (ii) is not needed), then it is the support function of a convex set
that we are entitled to call $F'(0)d$ because there holds $H(t) = F(0) + t\,F'(0)d$
(due to additivity of support functions). In other words, convexity of $s'_p(0)$
gives the "easy" situation in which (6) holds.
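
The additivity argument can be spelled out as follows. The notation
$\sigma_A(p) := \sup\{\langle p,y\rangle \mid y \in A\}$ is introduced here only for this
remark, and $s'_p(0)$ is assumed, as in the text, to be the support function
of the convex set denoted $F'(0)d$.

```latex
\begin{aligned}
\sigma_{A+B}(p) &= \sup_{a \in A,\ b \in B} \langle p, a+b \rangle
                 = \sigma_A(p) + \sigma_B(p),
\qquad
\sigma_{tB}(p) = t\,\sigma_B(p) \quad (t \ge 0), \\
s_p(0) + t\,s'_p(0) &= \sigma_{F(0)}(p) + t\,\sigma_{F'(0)d}(p)
                     = \sigma_{F(0) + t F'(0)d}(p).
\end{aligned}
```

The right-hand side of the inequalities in (10) is therefore the support
function of $F(0) + t\,F'(0)d$, and a closed convex set is described by its
support function in the same way as $F(t)$ is described by $s_p(t)$.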

The role of assumption (i) is more profound. It is natural to require
that $F'_y(0)$ does predict the behaviour of $F(t)$ near y; this behaviour is
trivial when $y \in \mathrm{int}\, F(0)$ (then $F(t)$ must contain y for all t small enough);
if y is on the boundary of $F(0)$ then there is a normal cone $N_y(0)$ to $F(0)$
at y, and $s_p(0) = \langle p,y\rangle$ for $p \in N_y(0)$; hence the behaviour of $F(t)$ near y
is naturally related to the behaviour of $s_p(t)$ for these normal p's
(incidentally, a key result is that
$F'_y(0) = \{\, z \mid \langle p,z\rangle \le s'_p(0) \ \forall p \in N_y(0) \,\}$; (i) is
essential for this). However, it is not only some technicalities in the
proof that require the uniformity stated in (i), but rather the deficiency
of $F'$ suggested by d) above: consider the innocent mapping

$$ F(t) := \{\, y = (y_1,y_2) \mid 0 \le y_1 \le 1,\ t\,y_1 \le y_2 \le 1 \,\}. $$

Given $a \in R$ and $p = (a,-1)$, $s_p(t) = \max\, \{(a-t)\,y_1 \mid 0 \le y_1 \le 1\}$ and thus,
(i) is violated: when $a \downarrow 0$, $s'_p(0)$ jumps from -1 to 0. For this example,
$H(t) = G(t) = [0,1] \times [t,1]$, which is a poor approximation of $F(t)$. This is
rather disappointing, but observe that Section 2 is well-suited for the
present F.
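
The failure of uniformity in (i) can be made concrete. For $p = (a,-1)$ the
formula above gives $s_p(t) = \max(0, a - t)$, hence $s'_p(0) = -1$ for every
a > 0. The sketch below (function names are chosen here for illustration)
measures the gap between the difference quotient and this derivative.

```python
def s(a, t):
    """Support function of F(t) in the direction p = (a, -1):
    max{(a - t)*y1 : 0 <= y1 <= 1} = max(0, a - t)."""
    return max(0.0, a - t)


def quotient_error(a, t):
    """|[s_p(t) - s_p(0)]/t - s'_p(0)| for a > 0, where s'_p(0) = -1."""
    return abs((s(a, t) - s(a, 0.0)) / t + 1.0)


for t in (1e-1, 1e-2, 1e-3):
    # with a = t/2 the error stays at 1/2, however small t is;
    # with a fixed (here a = 1) it vanishes as soon as t < a
    print(t, quotient_error(t / 2, t), quotient_error(1.0, t))
```

Normalizing p to lie in U divides these quantities by $\|p\| = \sqrt{1 + a^2}$,
which is close to 1 for small a, so the conclusion is unchanged: no single
$\delta$ works for all directions p.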